A Limitation of the PAC-Bayes Framework
PAC-Bayes is a useful framework for deriving generalization bounds, introduced by McAllester ('98). The framework has the flexibility to derive distribution- and algorithm-dependent bounds, which are often tighter than VC-related uniform convergence bounds. In this manuscript we present a limitation of the PAC-Bayes framework: we demonstrate an easy learning task that is not amenable to a PAC-Bayes analysis. Specifically, we consider the task of linear classification in 1D; it is well known that this task is learnable using just $O(\log(1/\delta)/\epsilon)$ examples. On the other hand, we show that this fact cannot be proved using a PAC-Bayes analysis: for any algorithm that learns 1-dimensional linear classifiers, there exists a (realizable) distribution for which the PAC-Bayes bound is arbitrarily large.
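As a reminder of where the $O(\log(1/\delta)/\epsilon)$ rate comes from, here is the standard realizable-case argument for 1D thresholds (a textbook sketch added for context, not taken from the manuscript; $t^*$ denotes the true threshold). Let $B_\ell$ and $B_r$ be the intervals of probability mass $\epsilon/2$ immediately to the left and right of $t^*$. Then

\[
\Pr\bigl[\text{no sample falls in } B_\ell \text{ or none falls in } B_r\bigr] \;\le\; 2\,(1-\epsilon/2)^m \;\le\; 2e^{-\epsilon m/2}.
\]

If both intervals are hit, every threshold consistent with the sample deviates from $t^*$ by at most $\epsilon/2$ in probability mass on either side, so its error is at most $\epsilon$; solving $2e^{-\epsilon m/2} \le \delta$ gives $m = O(\log(1/\delta)/\epsilon)$.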
Review for NeurIPS paper: A Limitation of the PAC-Bayes Framework
Weaknesses: The paper is technically heavy for my expertise, so I can only raise questions about its content. Even if these questions are naive, discussing them in the paper would help other readers understand the scope of this work. A first concern is that the paper presents only the PAC-Bayes bound of McAllester (1999) (Theorem 1), which converges at rate sqrt(1/m). Since this pioneering work, many variations on the PAC-Bayes bound have been proposed. Notably, Seeger's (2002) and Catoni's (2007) bounds are known to converge at rate 1/m when the empirical risk is zero (see also Guedj (2019) for an up-to-date overview of the PAC-Bayes literature).
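For concreteness, the two rates the review contrasts can be written side by side (standard renderings; the exact constants vary across references, so treat these as representative rather than exact). McAllester-style bounds state that with probability at least $1-\delta$ over the sample, simultaneously for all posteriors $Q$,

\[
L(Q) \;\le\; \hat{L}(Q) + \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \ln(2\sqrt{m}/\delta)}{2m}},
\]

a $\sqrt{1/m}$ rate, while Seeger-type bounds instead control the binary relative entropy,

\[
\mathrm{kl}\bigl(\hat{L}(Q)\,\big\|\,L(Q)\bigr) \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \ln(2\sqrt{m}/\delta)}{m},
\]

which, in the case $\hat{L}(Q) = 0$, yields $L(Q) \le \bigl(\mathrm{KL}(Q\,\|\,P) + \ln(2\sqrt{m}/\delta)\bigr)/m$, the $1/m$ rate the reviewer mentions.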
Tightening the Evaluation of PAC Bounds Using Formal Verification Results
Thomas Walker, Alessio Lomuscio
Probably Approximately Correct (PAC) bounds are widely used to derive probabilistic guarantees for the generalisation of machine learning models. They highlight which components of a model contribute to its generalisation capacity. However, current state-of-the-art results only loosely approximate the generalisation capacity of deployed machine learning models. Consequently, while PAC bounds are theoretically useful, their applicability for evaluating a model's generalisation property in a given operational design domain is limited. The underlying classical theory rests on the idea that bounds tighten as the number of test points available to evaluate the model increases. Yet, in the case of neural networks, the number of test points required to obtain bounds of interest is often impractical even for small problems. In this paper, we take the novel approach of using the formal verification of neural systems to inform the evaluation of PAC bounds. Rather than using pointwise information obtained from repeated tests, we use verification results on regions around test points. We show that conditioning existing bounds on verification results leads to a tightening proportional to the underlying probability mass of the verified region.
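To illustrate the mechanism at a high level (an illustrative decomposition, not the paper's exact statement): if $V$ denotes the verified region with probability mass $\mu = \Pr[x \in V]$, and verification certifies that the model makes no errors on $V$, then by the law of total probability

\[
L(h) \;=\; \mu\,\Pr[\mathrm{err} \mid x \in V] + (1-\mu)\,\Pr[\mathrm{err} \mid x \notin V] \;\le\; (1-\mu)\,\Pr[\mathrm{err} \mid x \notin V],
\]

so a probabilistic bound need only be paid on the unverified mass $1-\mu$, which is the sense in which the tightening is proportional to the probability mass of the verified region.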
A Limitation of the PAC-Bayes Framework
The classical setting of supervised binary classification considers learning algorithms that receive (binary) labelled examples and are required to output a predictor, or classifier, that predicts the label of new and unseen examples. Within this setting, Probably Approximately Correct (PAC) generalization bounds quantify the success of an algorithm to approximately predict with high probability.

The PAC-Bayes framework, introduced in [24, 37] and further developed in [23, 22, 33], provides PAC-flavored bounds for Bayesian algorithms that produce Gibbs classifiers (also called stochastic classifiers). These are learners that, instead of outputting a single classifier, output a probability distribution over the family of classifiers. Their performance is measured by the expected success of prediction, where the expectation is taken with respect to both the sampled data and the sampled classifier.

A PAC-Bayes generalization bound relates the generalization error of the algorithm to the KL distance between the output stochastic classifier and some prior distribution P. In more detail, the generalization bound comprises two terms: first, the empirical error of the output Gibbs classifier, and second, the KL distance between the output Gibbs classifier and some arbitrary (but sample-independent) prior distribution. This standard bound captures a basic intuition: a good learner needs to balance bias, manifested in the form of a prior, against fitting the data, which is measured by the empirical loss.

A natural task, then, is to characterize the potential as well as the limitations of Gibbs learners that are amenable to PAC-Bayes analysis. As for the potential, several past results established the strength and utility of this framework (e.g.
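To make the two-term structure concrete, here is a minimal numerical sketch (my own illustration, not code from the paper) that evaluates a McAllester-style bound for a Gibbs classifier over a finite grid of 1D threshold classifiers. The grid discretization, the uniform prior, and the posterior supported on empirical risk minimizers are all illustrative assumptions.

import numpy as np

# Minimal sketch: a McAllester-style PAC-Bayes bound for a Gibbs classifier
# over a finite grid of 1D thresholds h_t(x) = sign(x - t). Illustrative only.
rng = np.random.default_rng(0)

thresholds = np.linspace(-1.0, 1.0, 201)  # discretized hypothesis class

# A realizable sample: labels given by the true threshold t* = 0.3.
m = 100
x = rng.uniform(-1.0, 1.0, size=m)
y = np.sign(x - 0.3)

# Empirical error of every threshold classifier on the sample.
errs = np.array([np.mean(np.sign(x - t) != y) for t in thresholds])

# Prior P: uniform over the grid. Posterior Q: uniform over the ERM set.
P = np.full(len(thresholds), 1.0 / len(thresholds))
Q = (errs == errs.min()).astype(float)
Q /= Q.sum()

# KL(Q || P) between two distributions on the same finite grid.
mask = Q > 0
kl = np.sum(Q[mask] * np.log(Q[mask] / P[mask]))

# McAllester-style bound: L(Q) <= L_hat(Q) + sqrt((KL + ln(2 sqrt(m)/delta)) / (2m)).
delta = 0.05
emp = float(Q @ errs)
bound = emp + np.sqrt((kl + np.log(2 * np.sqrt(m) / delta)) / (2 * m))
print(f"empirical Gibbs risk = {emp:.3f}, KL(Q||P) = {kl:.3f}, bound = {bound:.3f}")

On a realizable sample the empirical term vanishes, so the bound is driven entirely by the KL term between posterior and prior, which is precisely the quantity the paper's lower bound targets.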